{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "671e9ec1-fa00-4c92-a2fb-ceb142168ea9",
   "metadata": {},
   "source": [
    "# JaguarDB Vector Database\n",
    "\n",
    ">[JaguarDB Vector Database](http://www.jaguardb.com/windex.html)\n",
    ">\n",
    ">1. It is a distributed vector database\n",
    ">2. The “ZeroMove” feature of JaguarDB enables instant horizontal scalability\n",
    ">3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial\n",
    ">4. All-masters: allows both parallel reads and writes\n",
    ">5. Anomaly detection capabilities\n",
    ">6. RAG support: combines LLM with proprietary and real-time data\n",
    ">7. Shared metadata: sharing of metadata across multiple vector indexes\n",
    ">8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhattan, Chebyshev, Hamming, Jaccard, Minkowski"
   ]
  },
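  {
   "cell_type": "markdown",
   "id": "2f6f1f2a-9c3e-4b7d-8a1e-3d2c5e6f7a8b",
   "metadata": {},
   "source": [
    "As a quick illustration of the distance metrics listed above, the snippet below computes two of them (cosine and Euclidean) with NumPy on made-up toy vectors. This is a sketch for intuition only; JaguarDB computes these metrics server-side on the stored embeddings.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a7e2b1c-8d4f-4c6a-9b0e-1f2a3b4c5d6e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Two toy vectors (made-up values, not real embeddings)\n",
    "a = np.array([1.0, 2.0, 3.0])\n",
    "b = np.array([2.0, 4.0, 6.0])\n",
    "\n",
    "# Cosine distance: 1 - cosine similarity; 0.0 means identical direction\n",
    "cosine_distance = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))\n",
    "\n",
    "# Euclidean distance: straight-line distance between the two points\n",
    "euclidean_distance = np.linalg.norm(a - b)\n",
    "\n",
    "print(cosine_distance, euclidean_distance)"
   ]
  },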
  {
   "cell_type": "markdown",
   "id": "1a87dc28-1344-4003-b31a-13e4cb71bf48",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "There are two requirements for running the examples in this file.\n",
    "1. You must install and set up the JaguarDB server and its HTTP gateway server.\n",
    "   Please refer to the instructions in:\n",
    "   [www.jaguardb.com](http://www.jaguardb.com)\n",
    "\n",
    "2. You must install the http client package for JaguarDB:\n",
    "   ```\n",
    "       pip install -U jaguardb-http-client\n",
    "   ```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7d56993-4809-4e42-a409-94d3a7305ad8",
   "metadata": {},
   "source": [
    "## RAG With Langchain\n",
    "\n",
    "This section demonstrates chatting with LLM together with Jaguar in the langchain software stack.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d62c2393-5c7c-4bb6-8367-c4389fa36a4e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import TextLoader\n",
    "from langchain_community.vectorstores.jaguar import Jaguar\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "from langchain_text_splitters import CharacterTextSplitter\n",
    "\n",
    "\"\"\" \n",
    "Load a text file into a set of documents \n",
    "\"\"\"\n",
    "loader = TextLoader(\"../../how_to/state_of_the_union.txt\")\n",
    "documents = loader.load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=300)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "\"\"\"\n",
    "Instantiate a Jaguar vector store\n",
    "\"\"\"\n",
    "### Jaguar HTTP endpoint\n",
    "url = \"http://192.168.5.88:8080/fwww/\"\n",
    "\n",
    "### Use OpenAI embedding model\n",
    "embeddings = OpenAIEmbeddings()\n",
    "\n",
    "### Pod is a database for vectors\n",
    "pod = \"vdb\"\n",
    "\n",
    "### Vector store name\n",
    "store = \"langchain_rag_store\"\n",
    "\n",
    "### Vector index name\n",
    "vector_index = \"v\"\n",
    "\n",
    "### Type of the vector index\n",
    "# cosine: distance metric\n",
    "# fraction: embedding vectors are decimal numbers\n",
    "# float: values stored with floating-point numbers\n",
    "vector_type = \"cosine_fraction_float\"\n",
    "\n",
    "### Dimension of each embedding vector\n",
    "vector_dimension = 1536\n",
    "\n",
    "### Instantiate a Jaguar store object\n",
    "vectorstore = Jaguar(\n",
    "    pod, store, vector_index, vector_type, vector_dimension, url, embeddings\n",
    ")\n",
    "\n",
    "\"\"\"\n",
    "Login must be performed to authorize the client.\n",
    "The environment variable JAGUAR_API_KEY or file $HOME/.jagrc\n",
    "should contain the API key for accessing JaguarDB servers.\n",
    "\"\"\"\n",
    "vectorstore.login()\n",
    "\n",
    "\n",
    "\"\"\"\n",
    "Create vector store on the JaguarDB database server.\n",
    "This should be done only once.\n",
    "\"\"\"\n",
    "# Extra metadata fields for the vector store\n",
    "metadata = \"category char(16)\"\n",
    "\n",
    "# Number of characters for the text field of the store\n",
    "text_size = 4096\n",
    "\n",
    "#  Create a vector store on the server\n",
    "vectorstore.create(metadata, text_size)\n",
    "\n",
    "\"\"\"\n",
    "Add the texts from the text splitter to our vectorstore\n",
    "\"\"\"\n",
    "vectorstore.add_documents(docs)\n",
    "\n",
    "\"\"\" Get the retriever object \"\"\"\n",
    "retriever = vectorstore.as_retriever()\n",
    "# retriever = vectorstore.as_retriever(search_kwargs={\"where\": \"m1='123' and m2='abc'\"})\n",
    "\n",
    "\"\"\" The retriever object can be used with LangChain and LLM \"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11178867-d143-4a10-93bf-278f5f10dc1a",
   "metadata": {},
   "source": [
    "## Interaction With Jaguar Vector Store\n",
    "\n",
    "Users can interact directly with the Jaguar vector store for similarity search and anomaly detection.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9a53cb5-e298-4125-9ace-0d851198869a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.vectorstores.jaguar import Jaguar\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "\n",
    "# Instantiate a Jaguar vector store object\n",
    "url = \"http://192.168.3.88:8080/fwww/\"\n",
    "pod = \"vdb\"\n",
    "store = \"langchain_test_store\"\n",
    "vector_index = \"v\"\n",
    "vector_type = \"cosine_fraction_float\"\n",
    "vector_dimension = 10\n",
    "embeddings = OpenAIEmbeddings()\n",
    "vectorstore = Jaguar(\n",
    "    pod, store, vector_index, vector_type, vector_dimension, url, embeddings\n",
    ")\n",
    "\n",
    "# Login for authorization\n",
    "vectorstore.login()\n",
    "\n",
    "# Create the vector store with two metadata fields\n",
    "# This needs to be run only once.\n",
    "metadata_str = \"author char(32), category char(16)\"\n",
    "vectorstore.create(metadata_str, 1024)\n",
    "\n",
    "# Add a list of texts\n",
    "texts = [\"foo\", \"bar\", \"baz\"]\n",
    "metadatas = [\n",
    "    {\"author\": \"Adam\", \"category\": \"Music\"},\n",
    "    {\"author\": \"Eve\", \"category\": \"Music\"},\n",
    "    {\"author\": \"John\", \"category\": \"History\"},\n",
    "]\n",
    "ids = vectorstore.add_texts(texts=texts, metadatas=metadatas)\n",
    "\n",
    "#  Search similar text\n",
    "output = vectorstore.similarity_search(\n",
    "    query=\"foo\",\n",
    "    k=1,\n",
    "    metadatas=[\"author\", \"category\"],\n",
    ")\n",
    "assert output[0].page_content == \"foo\"\n",
    "assert output[0].metadata[\"author\"] == \"Adam\"\n",
    "assert output[0].metadata[\"category\"] == \"Music\"\n",
    "assert len(output) == 1\n",
    "\n",
    "# Search with filtering (where)\n",
    "where = \"author='Eve'\"\n",
    "output = vectorstore.similarity_search(\n",
    "    query=\"foo\",\n",
    "    k=3,\n",
    "    fetch_k=9,\n",
    "    where=where,\n",
    "    metadatas=[\"author\", \"category\"],\n",
    ")\n",
    "assert output[0].page_content == \"bar\"\n",
    "assert output[0].metadata[\"author\"] == \"Eve\"\n",
    "assert output[0].metadata[\"category\"] == \"Music\"\n",
    "assert len(output) == 1\n",
    "\n",
    "# Anomaly detection\n",
    "result = vectorstore.is_anomalous(\n",
    "    query=\"dogs can jump high\",\n",
    ")\n",
    "assert result is False\n",
    "\n",
    "# Remove all data in the store\n",
    "vectorstore.clear()\n",
    "assert vectorstore.count() == 0\n",
    "\n",
    "# Remove the store completely\n",
    "vectorstore.drop()\n",
    "\n",
    "# Logout\n",
    "vectorstore.logout()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
